fix(chrome-ai): probe-gate caps + session/validation correctness (#514) by sroussey · Pull Request #520 · workglow-dev/libs

sroussey · 2026-05-20T09:01:38Z

Stacks fixes on top of #514's chrome-ai branch. Five focused commits — one per issue.

Summary

C1 — Probe-gate `tool-use` and `json-mode`

inferWebBrowserCapabilities unconditionally advertised json-mode + tool-use for chrome-prompt/gemini-nano, but LanguageModel.create's tools and LanguageModel.prompt's responseConstraint aren't universally accepted across Chrome builds. The dispatcher could route a json-mode/tool-use task to a provider that would reject it at runtime.

New providers/chrome-ai/src/ai/common/WebBrowser_CapabilityProbe.ts: one-shot probe with module-level promise coalescing; smoke-tests both options independently and immediately destroys the test sessions.
WebBrowserProvider constructor kicks off the probe and stores result on this.probedCaps. Provider exposes ready(): Promise<void>.
Pre-probe: conservative subset (no json-mode, no tool-use). Post-probe: reflects browser surface.
inferWebBrowserCapabilities(model, probed?) defaults to {jsonMode: true, toolUse: true} for back-compat; new inferWebBrowserCapabilitiesAsync drives the probe.
Files: providers/chrome-ai/src/ai/common/WebBrowser_Capabilities.ts, providers/chrome-ai/src/ai/WebBrowserProvider.ts, providers/chrome-ai/src/ai/index.ts.
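
A minimal sketch of the coalesced one-shot probe described in C1, assuming a simplified factory interface. `ProbeFactory`, `ProbeSession`, `probeCapabilities`, `gateCapabilities`, the probe payloads and the `"text"` base capability are hypothetical names for illustration, not the code in WebBrowser_CapabilityProbe.ts.

```typescript
// Hypothetical shapes for illustration; the real Chrome types and the PR's module may differ.
interface ProbeSession {
  destroy(): void;
}
interface ProbeFactory {
  create(options: Record<string, unknown>): Promise<ProbeSession>;
}
interface ProbedCaps {
  jsonMode: boolean;
  toolUse: boolean;
}

// Module-level slot: the first caller starts the probe, later callers share its promise.
let probePromise: Promise<ProbedCaps> | undefined;

// Returns true if create() accepts the option; the test session is destroyed right away.
async function accepts(factory: ProbeFactory, options: Record<string, unknown>): Promise<boolean> {
  try {
    const session = await factory.create(options);
    session.destroy();
    return true;
  } catch {
    return false;
  }
}

function probeCapabilities(factory: ProbeFactory): Promise<ProbedCaps> {
  if (probePromise === undefined) {
    probePromise = (async () => {
      // The two options are tested independently so one failing does not mask the other.
      const [jsonMode, toolUse] = await Promise.all([
        accepts(factory, { responseConstraint: { type: "object" } }),
        accepts(factory, { tools: [] }),
      ]);
      return { jsonMode, toolUse };
    })();
  }
  return probePromise;
}

// Pre-probe (probed === undefined) yields the conservative subset.
function gateCapabilities(base: string[], probed?: ProbedCaps): string[] {
  const caps = [...base];
  if (probed?.jsonMode) caps.push("json-mode");
  if (probed?.toolUse) caps.push("tool-use");
  return caps;
}
```

Storing the promise rather than the resolved value is what lets concurrent callers share a single round-trip: a second call made before the first resolves gets the same in-flight promise.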

H1 — `WebBrowser_StructuredGeneration` accepts `sessionId`

The run-fn dropped sessionId from its signature, so successive calls with the same id always rebuilt the underlying LanguageModel.

Accepts sessionId as the 6th positional param.
Cache reuse keys on a canonical schema fingerprint (recursive key-sorted stringify) stored on ChromeChatSessionState.schemaFingerprint.
Schema mismatch forces a rebuild (Chrome's responseConstraint state is bound at first prompt).
On stream failure: drop+destroy via the same cacheWritten/dropChromeSessionEntry dance as WebBrowser_Chat.
File: providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts (line ~41 signature), providers/chrome-ai/src/ai/common/WebBrowser_Sessions.ts (extended ChromeChatSessionState).
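
The canonical fingerprint in H1 can be sketched as a recursive key-sorted stringify. This is an illustrative version under the assumption that schemas are plain JSON values; `canonicalStringify`, `schemaFingerprint`, `canReuseSession` and the `CachedEntry` shape are hypothetical names, not the PR's code.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialise with object keys sorted at every level, so key order never changes the result.
function canonicalStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map(canonicalStringify).join(",") + "]";
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return "{" + entries.map(([k, v]) => JSON.stringify(k) + ":" + canonicalStringify(v)).join(",") + "}";
}

function schemaFingerprint(schema: Json): string {
  return canonicalStringify(schema);
}

// Hypothetical cache entry: only the field relevant to the reuse decision.
interface CachedEntry {
  schemaFingerprint?: string;
}

// Reuse only when the cached session was created for an equivalent schema.
function canReuseSession(entry: CachedEntry | undefined, schema: Json): boolean {
  return entry !== undefined && entry.schemaFingerprint === schemaFingerprint(schema);
}
```

Two schemas that differ only in key order produce the same fingerprint, while any structural change yields a different string and therefore forces a rebuild.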

H2 — `WebBrowser_ToolCalling` accepts `sessionId`

The run-fn ignored both outputSchema and sessionId.

Accepts both, as the 5th and 6th positional params.
Sorted-tool-name fingerprint stored on ChromeChatSessionState.toolsFingerprint. Tool-set change rebuilds.
Correctness guard: only cache when input.messages is present. Bare-prompt callers always rebuild because Chrome appends tool-result turns to the session's internal state opaquely — reusing a cache the orchestrator hasn't fully replayed would double-feed results. Documented in code.
On error: drop+destroy.
File: providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts (line ~104 signature).
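
A rough sketch of the H2 reuse rule: a sorted-tool-name fingerprint plus the messages-only guard. The `ToolDef`, `ToolCallInput` and `CachedToolSession` shapes and both function names are assumptions made for illustration.

```typescript
// Hypothetical shapes; only the fields the reuse decision needs.
interface ToolDef {
  name: string;
}
interface ToolCallInput {
  prompt?: string;
  messages?: unknown[];
  tools: ToolDef[];
}
interface CachedToolSession {
  toolsFingerprint?: string;
}

// Order-insensitive fingerprint of the tool set; JSON encoding avoids separator ambiguity.
function toolsFingerprint(tools: readonly ToolDef[]): string {
  return JSON.stringify(tools.map((tool) => tool.name).sort());
}

function canReuseToolSession(input: ToolCallInput, cached: CachedToolSession | undefined): boolean {
  // Bare-prompt callers always rebuild: the session's internal tool-result turns are opaque.
  if (input.messages === undefined) return false;
  if (cached === undefined) return false;
  // A changed tool set forces a rebuild because tools are bound when the session is created.
  return cached.toolsFingerprint === toolsFingerprint(input.tools);
}
```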

H3 — Validate tool-call arguments against `inputSchema`

callInput = (args[0] ?? {}) was forwarded verbatim; filterValidToolCalls only checked the tool name.

Compile each tool's inputSchema once via compileSchema from @workglow/util/schema, cached by name.
Validate captured args before filterValidToolCalls. Invalid → drop + getLogger().warn(...) matching the existing name-only warning style.
Tools whose inputSchema fails to compile log once and fall through to name-only validation (no run-level crash).
File: providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts (line ~128 execute stub + new validator pass).
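
The H3 validation pass might look roughly like the following. The exact signature of compileSchema is not shown in this PR description, so the sketch injects a hypothetical `Compile` function instead; `compileValidators` and `dropInvalidCalls` are illustrative names, and the sketch folds the name check and the argument check into one pass for brevity.

```typescript
// Hypothetical stand-ins for the real schema compiler and its validators.
type Validator = (value: unknown) => string[]; // error messages, empty when valid
type Compile = (schema: unknown) => Validator; // may throw on a malformed schema

interface ToolDef {
  name: string;
  inputSchema: unknown;
}
interface ToolCall {
  name: string;
  input: unknown;
}

// Compile each tool's schema once. A compile failure warns once and records null,
// which later means "name-only validation" for that tool.
function compileValidators(
  tools: ToolDef[],
  compile: Compile,
  warn: (message: string) => void
): Map<string, Validator | null> {
  const validators = new Map<string, Validator | null>();
  for (const tool of tools) {
    try {
      validators.set(tool.name, compile(tool.inputSchema));
    } catch {
      warn(`inputSchema for "${tool.name}" failed to compile; using name-only validation`);
      validators.set(tool.name, null);
    }
  }
  return validators;
}

// Drop calls to unknown tools and calls whose arguments fail validation.
function dropInvalidCalls(
  calls: ToolCall[],
  validators: Map<string, Validator | null>,
  warn: (message: string) => void
): ToolCall[] {
  return calls.filter((call) => {
    const validate = validators.get(call.name);
    if (validate === undefined) {
      warn(`dropping call to unknown tool "${call.name}"`);
      return false;
    }
    if (validate === null) return true; // schema did not compile: name check only
    const errors = validate(call.input);
    if (errors.length === 0) return true;
    warn(`dropping call to "${call.name}": ${errors[0]}`);
    return false;
  });
}
```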

H4 — Validate StructuredGeneration final JSON against `outputSchema`

When JSON.parse failed AND parsePartialJson returned undefined, the run-fn cast {} to the output type, emitted a finish event, and downstream code had no way to distinguish that from a legitimate empty payload.

compileSchema at the top of the run; compile failure → PermanentJobError("invalid outputSchema") (avoids burning retry budget on a malformed schema).
Unparseable final string → PermanentJobError("Chrome AI returned unparseable JSON"). No finish emitted.
Parsed but invalid → PermanentJobError("Chrome AI output failed schema validation: ..."). No finish emitted.
Only on parse+validate success is finish emitted and the cache entry written.
File: providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts (lines ~94, ~96 of the original; reworked).
Retry contract: verified StructuredGenerationTask.executeStream (in packages/ai) wraps super.executeStream(currentInput, context) in a per-attempt for-await and validates per finish, so a thrown error correctly fails the attempt and the loop retries up to maxRetries. Throwing without emitting finish is the right shape.
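
The parse-then-validate order in H4 can be illustrated as below. `PermanentJobError` here is a local stand-in class and `Validate` is a hypothetical validator type; the real run-fn also tries parsePartialJson before giving up, which this sketch omits.

```typescript
// Local stand-in for the real error class, for illustration only.
class PermanentJobError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PermanentJobError";
  }
}

// Hypothetical validator: returns error messages, empty when the value is valid.
type Validate = (value: unknown) => string[];

// Returns the parsed value only when it both parses and validates; otherwise throws,
// so the caller never emits finish for a failed attempt.
function finalizeStructuredOutput(raw: string, validate: Validate): unknown {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new PermanentJobError("Chrome AI returned unparseable JSON");
  }
  const errors = validate(parsed);
  if (errors.length > 0) {
    throw new PermanentJobError(`Chrome AI output failed schema validation: ${errors[0]}`);
  }
  return parsed;
}
```

With this shape, an empty object is returned only when the model actually produced `{}` and the schema allows it, which removes the ambiguity of the old `?? {}` fallback.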

Test plan

bun test packages/test/src/test/ai-provider/WebBrowserProvider.test.ts — 46 tests pass (up from 19).
- 14 inferCapabilities + probe coalescing tests.
- 4 SG session cache tests + 3 SG validation tests.
- 3 TC session cache tests + 4 TC argument validation tests.
tsgo --noEmit clean on providers/chrome-ai/ and packages/test/.
bunx vitest run packages/test/src/test/ai-provider — only failures are unrelated (HFT bbox unit test, llamacpp model download race) and reproduce on the base branch.
Manual smoke on a real Chrome build with the Prompt API enabled.
Manual smoke on a Chrome build without tools / responseConstraint (probe should gate them out).

Open questions

Probe surface: today the probe smoke-tests create({ responseConstraint }) and create({ tools }). Per spec, responseConstraint actually lives on prompt() options, not create(), and the Chromium types agree: tools is a create-time option while responseConstraint is per-prompt. A build that silently accepts unknown create options could therefore give us a false positive for json-mode. The user brief explicitly asked for the create-time test for both. If we want a tighter signal we could additionally run a short promptStreaming with the constraint and read one chunk. Worth a follow-up.
H4 retry contract: does StructuredGenerationTask.executeStream catch per-attempt errors from the run-fn? I inspected the task and confirmed it iterates per attempt and validates on finish, so throwing without finish should retry. I could not run this against a real failing model, so please verify with a live Chrome AI smoke test.
Fingerprint storage: H1's schema fingerprint and H2's tools fingerprint live on ChromeChatSessionState. They're string-typed and unbounded — for very large schemas the canonical-stringify cost is non-trivial. If we see a hot path, hash to a fixed-length digest.
Worker bundle compatibility: @workglow/util/worker deliberately excludes compileSchema (json-schema-library + URI.js + nearley + json-pointer is heavyweight). H3/H4 import from @workglow/util/schema which pulls those in. bun build --packages=external keeps them external so worker startup cost grows only if the consumer actually imports. Worth confirming the worker bundle size delta is acceptable, or guarding the validation behind a no-op fallback when running in the worker entry.
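
For the fingerprint-storage question, one possible way to hash a canonical string to a fixed-length digest is FNV-1a, sketched here. This is an assumption about what a follow-up could look like, not something the PR implements; `fnv1a32` is a hypothetical helper, and a 32-bit digest can collide, so a wider hash would be safer if a collision must never cause a wrong session reuse.

```typescript
// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// For ASCII input this matches the standard byte-based FNV-1a test vectors.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}
```

The stored fingerprint then has constant size regardless of schema size, although the canonical stringify still has to run once per call to produce the input.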

Generated by Claude Code

Chrome's `LanguageModel.create` did not universally accept `tools` or `responseConstraint` options, yet `inferWebBrowserCapabilities` always advertised `tool-use` + `json-mode` for `chrome-prompt`/`gemini-nano`. This caused the dispatcher to route json-mode and tool-use tasks to the WebBrowser provider on Chrome builds that would reject them at runtime. Adds a one-shot capability probe (`probeWebBrowserCapabilities`) that smoke-tests `factory.create({ responseConstraint })` and `factory.create({ tools })`, with module-level coalescing so concurrent callers share one probe round-trip. `WebBrowserProvider` kicks the probe off in its constructor; until it resolves, `inferCapabilities` returns the conservative subset (no `json-mode`, no `tool-use`). Tests cover all four probe outcome combinations, coalescing, and pre/post-ready inference. https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

…ingerprint (H1) The structured-generation run-fn dropped `sessionId` from its signature, so successive calls with the same id always rebuilt the underlying Chrome `LanguageModel` even though the surface supports session reuse. This matched the pre-session-cache behaviour rather than the post-cache shape adopted by `WebBrowser_Chat`. Accept `sessionId` as the 6th positional parameter, mirroring chat. Cache reuse is gated on a canonical schema fingerprint stored on the cache entry — a schema change forces a rebuild because Chrome's `responseConstraint` state is bound at first-prompt and re-feeding a different schema is undefined behaviour. On stream failure the entry is dropped + destroyed via the same `cacheWritten` / `dropChromeSessionEntry` dance as chat. `ChromeChatSessionState` grows an optional `schemaFingerprint` field. https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

… (H2) `WebBrowser_ToolCalling` ignored both `outputSchema` and `sessionId` — the 5th and 6th positional parameters of the run-fn contract — so multi-turn tool-calling rebuilt the `LanguageModel` each turn. Accept both parameters. Cache reuse keys on a sorted-tool-name fingerprint (Chrome binds `tools` at `create()` time and can't hot-swap them per turn). We only cache when the orchestrator drives via `input.messages` because Chrome's tool-calling loop appends tool-result turns to the session's internal state opaquely — reusing a cached session across a turn the orchestrator hasn't fully replayed would double-feed those results. Bare-prompt callers always rebuild. On any error we drop + destroy the cache entry: Chrome's internal state may be mid-tool-call-cycle. `ChromeChatSessionState` grows an optional `toolsFingerprint` field. https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

… (H3) Chrome's `LanguageModel` invokes our stub `execute` callback with whatever arguments the model emits. `filterValidToolCalls` only checked the tool name, so a hallucinated arg shape was forwarded to the orchestrator verbatim — leaving the downstream tool runner to either fail or silently produce garbage. Compile each tool's `inputSchema` once via `compileSchema` (cached by name) before the stream starts. After streaming we validate every captured call's `input` against its tool's validator; failures are dropped + warn-logged in the same shape as `filterValidToolCalls`'s existing name-only warning. Tools whose `inputSchema` fails to compile emit a single warning and fall through to the name-only check rather than failing the whole run. https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

…ma (H4) Chrome's `responseConstraint` is best-effort, not a hard guarantee — the model can still produce a partial or shape-mismatched payload. The existing fallback (`parsePartialJson(...) ?? {}`) handed downstream code an empty object cast to the output type, indistinguishable from a legitimate empty payload. Worse, that path emitted a `finish` event, so `StructuredGenerationTask`'s retry loop had no signal to retry on. Compile the validator once via `compileSchema`. After streaming:
- If neither `JSON.parse` nor `parsePartialJson` produces a value: throw `PermanentJobError("Chrome AI returned unparseable JSON")`.
- If validation fails: throw with the first validator error message.
- Only on success do we emit `finish` and write the cache entry.
`StructuredGenerationTask.executeStream` catches per-attempt errors and retries, so throwing here is the correct signal — no `finish` so the loop knows this attempt failed. Schema compile failures are also surfaced as `PermanentJobError` (so retries don't burn through quota on a malformed schema). https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

pkg-pr-new · 2026-05-20T09:03:30Z

@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@520

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@520

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@520

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@520

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@520

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@520

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@520

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@520

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@520

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@520

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@520

@workglow/util

npm i https://pkg.pr.new/@workglow/util@520

workglow

npm i https://pkg.pr.new/workglow@520

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@520

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@520

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@520

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@520

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@520

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@520

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@520

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@520

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@520

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@520

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@520

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@520

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@520

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@520

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@520

commit: b6e3cfe

github-actions · 2026-05-20T09:05:29Z

Coverage Report

Status	Category	Percentage	Covered / Total
🔵	Lines	62.17%	22881 / 36801
🔵	Statements	62.04%	23673 / 38152
🔵	Functions	63.14%	4310 / 6826
🔵	Branches	50.74%	11100 / 21876

File Coverage

No changed files found.

Generated in workflow #2313 for commit b6e3cfe by the Vitest Coverage Report Action

Copilot

Pull request overview

This PR hardens the @workglow/chrome-ai provider by (1) probing Chrome Prompt API feature support before advertising json-mode / tool-use, and (2) fixing session reuse + schema validation correctness for Structured Generation and Tool Calling run functions.

Changes:

Add a module-level capability probe (coalesced) and wire it into WebBrowserProvider with a ready() hook and conservative pre-probe capability inference.
Fix sessionId handling and cache invalidation rules for WebBrowser_StructuredGeneration and WebBrowser_ToolCalling, including schema/toolset fingerprinting.
Add schema validation for Tool Calling args (inputSchema) and Structured Generation final JSON (outputSchema), plus expand provider test coverage substantially.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

File	Description
providers/chrome-ai/src/ai/WebBrowserProvider.ts	Kicks off capability probing in the constructor; exposes `ready()`; gates inferred capabilities using probed results.
providers/chrome-ai/src/ai/index.ts	Extends `_testOnly` exports for probe helpers and run-fns to support new tests.
providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts	Adds `sessionId` support, toolset fingerprinting + caching rules, and validates tool-call args against each tool’s `inputSchema`.
providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts	Adds `sessionId` support, schema fingerprinting + caching, and validates final JSON against `outputSchema` with PermanentJobError failures.
providers/chrome-ai/src/ai/common/WebBrowser_Sessions.ts	Extends cached session state to store schema/tool fingerprints alongside the session + message watermark.
providers/chrome-ai/src/ai/common/WebBrowser_CapabilityProbe.ts	New probe module that smoke-tests optional Chrome Prompt API surfaces and caches the result.
providers/chrome-ai/src/ai/common/WebBrowser_Capabilities.ts	Updates capability inference to conditionally include `json-mode` / `tool-use`; adds async inference helper.
packages/test/src/test/ai-provider/WebBrowserProvider.test.ts	Adds extensive tests covering probe behavior/coalescing, caching correctness, and schema validation behaviors.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* feat(chrome-ai): chat history, tool calling, structured generation, capability probe

Integrates the chrome-ai branch (7 commits — PR #514/#520/#528) with main's parallel chrome-ai work (model.download, model.dispose, ApiBinding):
- Chat-session cache keyed by AiChatTask sessionId, with messageCount high-water mark for reuse (replaces fingerprint-based invalidation)
- StructuredGeneration + ToolCalling run-fns gated by an async capability probe; pre-probe state advertises a conservative subset (no json-mode, no tool-use) so the provider never claims a capability it can't fulfil
- ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher (text.generation shared by AiChatTask + TextGenerationTask)
- ChromeHelpers ships both assertAvailability and ensureAvailable; both session APIs (chrome-chat cache + idle-evict store) coexist
- Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test already covers chat behavior under the new cache semantics)

* refactor(ai,chrome-ai,openai,hfi): shared tool sanitation; emit-pattern streams

Tool calling utilities (packages/ai/src/task/ToolCallingUtils.ts):
- sanitizeToolArgs: recursive __proto__/constructor/prototype scrubbing for model-supplied tool args (prototype-pollution defence)
- compileToolValidators + validateToolCallArgs: per-tool inputSchema validation with graceful fallback for tools whose schema fails to compile

Stream helpers converted from generators to emit-callback so run-fns no longer need a for-await/yield pump:
- snapshotStreamToTextDeltas / snapshotStreamToSnapshots (chrome-ai)
- accumulateOpenAIStream (@workglow/ai provider-utils, used by OpenAI + HFI)

Run-fns updated to call helpers with emit directly and emit their own final 'finish' event. chrome-ai's WebBrowser_ToolCalling drops its private sanitization + validation copy and reuses the shared utils.
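
A minimal sketch of the recursive key scrubbing that the refactor commit attributes to sanitizeToolArgs. The real helper in ToolCallingUtils.ts may differ in signature and behaviour; `scrubToolArgs` and `FORBIDDEN_KEYS` are hypothetical names for this illustration.

```typescript
// Keys that can be abused for prototype pollution when args are merged downstream.
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Returns a deep copy of the value with forbidden keys removed at every level.
function scrubToolArgs(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(scrubToolArgs);
  if (value === null || typeof value !== "object") return value;
  const clean: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value)) {
    if (FORBIDDEN_KEYS.has(key)) continue;
    clean[key] = scrubToolArgs(child);
  }
  return clean;
}
```

JSON.parse creates `__proto__` as an ordinary own property, so Object.entries sees it and the loop can skip it before any consumer merges the object.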
* fix(chrome-ai): wire model.dispose; apply sanitizeToolArgs across providers

Addresses review of #514/#520/#528 rebase:

CRITICAL fix — `model.dispose` now reaches chat-cached sessions. The post-rebase chrome-ai branch had two parallel session maps (`chromeSessions` for chat reuse, `sessions` for idle-evict + ModelDispose lookup) but only the chat map was populated by runtime code, making `model.dispose` a functional no-op in production. Unified into a single Map<sessionId, WebBrowserSessionEntry> with both chat-cache fields (messageCount, fingerprints) and lifecycle fields (modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires `modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the unified store, so model.dispose destroys chat-cached sessions. Chat sessions become subject to idle eviction (free bonus).

IMPORTANT — sanitizeToolArgs applied across the codebase per intent of the prior refactor:
- OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream) → covers OpenAI + HFI
- ToolCallParsers (adaptParserResult + parseToolCallsFromText) → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT
- Anthropic_ToolCalling (input_json_delta + content_block_stop)
- Gemini_ToolCalling (functionCall.args)
- Ollama_ToolCalling (parsed function.arguments)
- LlamaCpp_ToolCalling (extractNativeFunctionCalls)
- Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths)

Every model-supplied tool-arg payload now passes through sanitizeToolArgs before reaching downstream consumers, closing the prototype-pollution vector across the provider matrix.

Also:
- Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit tests for sanitizeToolArgs, compileToolValidators, validateToolCallArgs, plus a sanitize→validate→name-check integration test).
- Added WebBrowser_Sessions.test regression for the unified-store behavior (disposeWebBrowserSessionsForModel sees chat-cached entries).
- Documented WebBrowser_Chat's rebuild-on-next-turn recovery model (vs the in-fn retry that main's now-deleted test exercised).

* feat(chrome-ai): retry once on InvalidStateError when a cached session is destroyed

Chrome can destroy a `LanguageModel` session out from under us (tab backgrounding, GPU process restart, memory pressure). When a cached session's `promptStreaming` throws DOMException("...destroyed...", "InvalidStateError") we now rebuild the session from full history via `initialPrompts` and retry the prompt once.

Retry is gated on three conditions, all required:
- We were using a CACHED session (a fresh-session failure means the model is broken; retrying won't help).
- No text-delta has reached the consumer yet (we can't unsend deltas).
- The error name is `InvalidStateError` (matches Chrome's InvalidStateError DOMException; tolerant of message-text changes).

Tests:
- "retries once with a fresh session when a cached session is destroyed" seeds the cache on turn 1, has the cached session's promptStreaming throw on turn 2's reuse, asserts rebuild + retry + cache replacement.
- "does not retry when a fresh (non-cached) session fails" guards the first gate.
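
The three retry gates from the last commit can be sketched as follows, assuming a simplified session whose `stream` method pushes deltas through a callback. Chrome's real promptStreaming returns a stream, so `ChatSession`, `promptWithOneRetry` and `rebuild` are hypothetical names, and the rebuild-from-history step is left to the caller.

```typescript
// Hypothetical, simplified session shape for illustration.
interface ChatSession {
  stream(prompt: string, onDelta: (text: string) => void): Promise<void>;
}

// Match on the error name only, so message-text changes do not break the gate.
function isInvalidStateError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { name?: unknown }).name === "InvalidStateError";
}

// Runs the prompt and returns the session that should be cached afterwards.
async function promptWithOneRetry(
  cached: ChatSession | undefined,
  rebuild: () => Promise<ChatSession>,
  prompt: string,
  emit: (text: string) => void
): Promise<ChatSession> {
  const usedCache = cached !== undefined;
  let emitted = false;
  const forward = (text: string): void => {
    emitted = true;
    emit(text);
  };
  const session = cached ?? (await rebuild());
  try {
    await session.stream(prompt, forward);
    return session;
  } catch (err) {
    // All three gates must hold: cached session, nothing emitted yet, InvalidStateError.
    if (!(usedCache && !emitted && isInvalidStateError(err))) throw err;
    const fresh = await rebuild();
    await fresh.stream(prompt, forward); // a second failure propagates; there is no further retry
    return fresh;
  }
}
```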

claude added 5 commits May 20, 2026 08:55

sroussey self-assigned this May 20, 2026

sroussey requested a review from Copilot May 20, 2026 15:03

Copilot started reviewing on behalf of sroussey May 20, 2026 15:04 View session

Copilot AI reviewed May 20, 2026

Comment thread providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts Outdated

Potential fix for pull request finding

7135100

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

sroussey merged commit 917c34f into chrome-ai May 20, 2026
2 of 3 checks passed

sroussey deleted the claude/libs-514-fixes-Bi1rh branch May 20, 2026 15:17

sroussey merged 6 commits into chrome-ai from claude/libs-514-fixes-Bi1rh


3 participants
